Detecting Incorrect Numerical Data in DBpedia

Authors

  • Dominik Wienand
  • Heiko Paulheim
Abstract

DBpedia is a central hub of Linked Open Data (LOD). Being based on crowd-sourced content and heuristic extraction methods, it is not free of errors. In this paper, we study the application of unsupervised numerical outlier detection methods to DBpedia, using Interquantile Range (IQR), Kernel Density Estimation (KDE), and various dispersion estimators, combined with different semantic grouping methods. Our approach reaches 87% precision, and has led to the identification of 11 systematic errors in the DBpedia extraction framework.
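As a rough illustration of the IQR-based outlier detection named in the abstract, the following Python sketch flags values that fall outside a fence around the middle half of the data. It is not the authors' implementation: the function name `iqr_outliers`, the fence multiplier `k=1.5` (the conventional Tukey choice), and the sample heights are all assumptions made for this example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional fence width; it is an assumption here,
    not a parameter taken from the paper.
    """
    # quantiles() sorts the data itself; n=4 yields the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical person heights in metres; 178.0 mimics a unit mix-up (cm read as m).
heights = [1.62, 1.70, 1.75, 1.78, 1.80, 1.83, 1.91, 178.0]
print(iqr_outliers(heights))
```

In a setting like the one the abstract describes, such a check would presumably be run separately for each property within each semantic group, so that values are only compared against other values of the same kind.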


Similar resources

Detecting hidden errors in an ontology using contextual knowledge

Due to modeling errors in designing ontologies, an ontology may carry incorrect information. Ontology debugging can be helpful in detecting errors in ontologies that are increasing in size and expressiveness day by day. While current ontology debugging methods can detect logical errors (incoherences and inconsistencies), they are incapable of detecting hidden modeling errors in coherent and con...


Correcting Range Violation Errors in DBpedia

A range violation error is a problem when an object of a knowledge graph triple does not have a type required by the range of the triple’s predicate. This paper aims to correct these erroneous triples in DBpedia by finding correct objects with the required type to replace the incorrect objects. Our approach is based on graph analysis and keyword matching. It also exploits information from the i...


Detecting Errors in Numerical Linked Data Using Cross-Checked Outlier Detection

Outlier detection used for identifying wrong values in data is typically applied to single datasets to search them for values of unexpected behavior. In this work, we instead propose an approach which combines the outcomes of two independent outlier detection runs to get a more reliable result and to also prevent problems arising from natural outliers which are exceptional values in the dataset...


A comparison of complex correspondence detection techniques

One-to-one correspondences between entities are not always sufficient to describe the true relationship between related entities in diverse ontologies, and complex correspondences are needed instead. We demonstrate the types of complex correspondence occurring between two LOD sources and compare techniques for discovering these complex correspondences.


WhoKnows? Evaluating linked data heuristics with a quiz that cleans up DBpedia

Semantic technologies enable sophisticated search scenarios on educational video content. Linking Open Data (LOD) provides a vast amount of well-structured semantic information in heterogeneous domains. But despite the syntactically well-expressed RDF facts, many inconsistencies may occur when authoring and publishing LOD, especially if the data is generated with the help of automated metho...




Publication date: 2014